Fault-Tolerance in Multi-Computer Networks
Abstract
Multi-computers connected in various architectures are now commercially available and are being used for a variety of applications. Some of the most commonly used architectures are the binary hypercube, the binary tree, cube-connected cycles, the mesh, and multistage interconnection networks. All of these architectures have the major drawback that a single processor or edge failure may render the entire network unusable if the algorithm running on the network requires that the topology of the network be maintained. The failure of a single processor or of a link between two processors would destroy the topology of these architectures. Thus, some form of fault-tolerance must be incorporated into these architectures in order to make the network of processors more reliable. While several fault-tolerance schemes have been proposed for specific architectures, these schemes are not general enough to provide fault-tolerance for other architectures. The goal of this thesis is to provide a more general approach that can be applied to several of these multi-computer network architectures with only minor modifications. A general scheme for constructing fault-tolerant multi-computer networks is proposed which uses switching networks to interconnect the processors of the network. Two such switching networks are described in the thesis. The scheme can be used to provide k fault-tolerance with k spare processors. It compares favorably with other proposed schemes for fault-tolerant multi-computer networks, achieving higher reliability while using at most the same amount of extra hardware. A fault-tolerant multi-computer network constructed using the proposed scheme functions as if it were a non-redundant network. No extra control information is needed to ensure that the fault-tolerant network functions properly. When a processor fails, the reconfiguration process can be initiated distributively. Fast context switching is provided to speed up reconfiguration. These properties ...
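The idea of tolerating k faults with k spare processors while keeping the logical topology intact can be illustrated with a small sketch. This is not the switching-network design from the thesis, whose details are not given in the abstract; it only assumes a hypothetical remapping layer (the class `SparedNetwork` and the helper `hypercube_neighbors` are names invented here) that stands in for whatever the switching network does in hardware. The algorithm keeps addressing logical nodes, and a failed physical processor is replaced by a spare behind the scenes.

```python
class SparedNetwork:
    """Hypothetical logical-to-physical processor map with k spares."""

    def __init__(self, n_logical, k_spares):
        # Logical node i initially runs on physical processor i.
        self.physical_of = list(range(n_logical))
        # Physical processors n_logical .. n_logical + k_spares - 1 are idle spares.
        self.spares = list(range(n_logical, n_logical + k_spares))

    def fail(self, physical_id):
        """Record a processor failure and remap its logical node to a spare."""
        if physical_id in self.spares:
            # An idle spare failed: one fewer fault can be tolerated.
            self.spares.remove(physical_id)
            return
        # Raises ValueError if the processor is not currently in use.
        logical = self.physical_of.index(physical_id)
        if not self.spares:
            raise RuntimeError("more faults than spare processors")
        self.physical_of[logical] = self.spares.pop(0)


def hypercube_neighbors(node, dim):
    """Logical neighbors in a binary hypercube: flip each address bit once."""
    return [node ^ (1 << bit) for bit in range(dim)]


# A 3-dimensional hypercube (8 logical nodes) with 2 spares.
net = SparedNetwork(n_logical=8, k_spares=2)
net.fail(3)  # physical processor 3 fails
# Logical node 3 still has the same logical neighbors; only its host changed.
print(hypercube_neighbors(3, 3), net.physical_of[3])
```

The logical adjacency never changes in this sketch, which mirrors the claim that the fault-tolerant network behaves like a non-redundant one; how the physical links are actually rerouted is the part the switching networks would have to supply.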
Similar resources
FDMG: Fault detection method by using genetic algorithm in clustered wireless sensor networks
Wireless sensor networks (WSNs) consist of a large number of sensor nodes which are capable of sensing different environmental phenomena and sending the collected data to the base station or Sink. Since sensor nodes are made of cheap components and are deployed in remote and uncontrolled environments, they are prone to failure; thus, maintaining a network with its proper functions even when und...
A fault tolerance routing protocol considering defined reliability and energy consumption in wireless sensor networks
In wireless sensor networks, optimal energy consumption and maximum lifetime are important factors. In this article, an attempt has been made to send data packets with a particular reliability from the beginning, based on the AODV protocol. To this end, two new fields are added to the routing packets, and during routing and discovery of new routes, the lowest remaining energy of nodes and route tra...
Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid
Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...
Improving Energy Consumption by Using Cluster Based Routing Algorithm in Wireless Sensor Networks
Multi-path routing is a favored alternative for sensor networks, as it provides an easy mechanism to distribute traffic as well as considerable fault tolerance. In this paper, a new clustering-based multi-path routing protocol, namely ECRR (Energy efficient Cluster based Routing algorithm for improving Reliability), is proposed, which guarantees achievement of the required QoS ...
Evaluating Multipath TCP Resilience against Link Failures
Standard TCP is the de facto reliable transfer protocol for the Internet. It is designed to establish a reliable connection using only a single network interface. However, standard TCP with single interfacing performs poorly due to intermittent node connectivity. This requires the re-establishment of connections as the IP addresses change. Multi-path TCP (MPTCP) has emerged to utilize multiple ...